Abstract

Program verification using Hoare-style techniques requires many 
logical annotations. We have previously developed a generic anno- 
tation inference algorithm that weaves in all annotations required 
to certify safety properties for automatically generated code. It us- 
es patterns to capture generator- and property-specific code idioms 
and property-specific meta-program fragments to construct the an- 
notations. The algorithm is customized by specifying the code pat- 
terns and integrating them with the meta-program fragments for 
annotation construction. However, this is difficult since it involves 
tedious and error-prone low-level term manipulations. 

Here, we describe an annotation schema compiler that large- 
ly automates this customization task using generative techniques. 
It takes a collection of high-level declarative annotation schemas 
tailored towards a specific code generator and safety property, and 
generates all customized analysis functions and glue code required 
for interfacing with the generic algorithm core, thus effectively cre- 
ating a customized annotation inference algorithm. The compiler 
raises the level of abstraction and simplifies schema development 
and maintenance. It also takes care of some more routine aspects 
of formulating patterns and schemas, in particular handling of ir- 
relevant program fragments and irrelevant variance in the program 
structure, which reduces the size, complexity, and number of differ- 
ent patterns and annotation schemas that are required. The improve- 
ments described here make it easier and faster to customize the 
system to a new safety property or a new generator, and we demon- 
strate this by customizing it to certify frame safety of space flight 
navigation code that was automatically generated from Simulink
models by MathWorks’ Real-Time Workshop.

1. Introduction 

The verification of program safety and correctness using Hoare- 
style techniques requires many logical annotations (principally 
loop invariants, but also pre- and post-conditions) that must be 
woven into the program. These annotations constitute cross-cutting 
concerns, which makes their construction difficult and expensive. 
For example, verifying even a single array access safe may need 
annotations throughout the entire program to ensure that all the 
information about the array and the indexing expression that is
required for the proof is available at the access location. 

However, in certain cases it is possible to construct the required 
annotations automatically, e.g., if the program comes from a limit- 
ed domain [12] or if only limited properties are shown [10]. In our 
previous work [6], we have developed a generic annotation infer- 
ence algorithm that exploits the idiomatic structure of automatical- 
ly generated code to weave in the annotations required to verify a 
given safety property. Idioms are recurring code patterns that solve 
similar programming tasks using similar constructions. In automat- 
ically generated code, they result from the way generators usually 
derive code, i.e., by combining a finite number of building blocks 
(e.g., templates) following a finite number of combination methods
(e.g., template expansion). For example, Figure 1 shows three
matrix initialization idioms employed by Real-Time Workshop; the 
code in Figure 1(c) uses a vector to represent the matrix. 


(a)
    A[0,0] := a_{0,0};
    ...
    A[0,m] := a_{0,m};
    A[1,0] := a_{1,0};
    ...
    A[n,m] := a_{n,m};

(b)
    for i := 0 to n do
      for j := 0 to m do
        A[i,j] := a;

(c)
    for i := 0 to n-1 do
      for j := 0 to m do
        A[i*n + j] := a;

Figure 1. Idiomatic matrix initializations in Real-Time Workshop

Our inference algorithm uses patterns to capture these generator- 
and property-specific code idioms and property-specific meta- 
program fragments associated with these patterns to construct the 
annotations. It first builds an abstracted control-flow graph (CFG), 
using the patterns to collapse the code idioms into single nodes. 
It then traverses this graph and follows all paths from use-nodes 
backwards to all corresponding definitions, adding the annotations 
along the way. This algorithm is implemented as part of our AutoCert
system for the safety certification of automatically generated
code. Its core (i.e., CFG construction and traversal) is fully generic,
but it must be customized for a given code generator and safety
property by specifying the code patterns and integrating them with 
the implementation of the meta-program fragments for annotation 
construction. However, while the former part can build on a clean, 
declarative pattern language, the latter part has so far involved te- 
dious and error-prone low-level term and program manipulations. 

Here, we describe the AutoCert annotation schema compiler 
that largely automates this customization task. It takes a collection 
of annotation schemas tailored towards a specific code generator 
and safety property, and generates all glue code required for inter- 
facing with the generic algorithm core, thus effectively creating a 
customized annotation inference algorithm. The compiler allows us
to represent all knowledge required to handle a class of specific
certification situations declaratively and in one central location (i.e., in
the annotation schemas), which raises the level of abstraction and 
simplifies development and maintenance. It also takes care of some 
more routine aspects of formulating patterns and schemas, in par- 
ticular handling of irrelevant program fragments (“junk”) and irrel- 
evant variance in the program structure (e.g., the order of branch- 
es in conditionals), which reduces the size, complexity, and num- 
ber of different patterns and annotation schemas that are required. 
Together with improvements of the underlying core inference al- 
gorithm and the pattern matching machine, the schema compiler 
makes it much easier and faster to customize the generic annota- 
tion inference algorithm to a new safety property or a new genera- 
tor. We demonstrate this by customizing it to certify frame safety of 
space flight navigation code that was automatically generated from 
Simulink models by Real-Time Workshop [1].

In this paper, we thus build on but substantially improve over 
our previous work on annotation inference for automatically generated
code [6]. Our main technical contributions are the development
of the schema compiler and the implicit junk handling by the
compiler. We have also modified the underlying core inference algorithm
so that the inference for one variable can “restart” the algorithm
for other variables if the safety of the former depends on the
latter. This dependency is also controlled by the schemas. Finally, 
we have extended the pattern language by additional constraint op- 
erators, which makes it more expressive and allows more context- 
sensitivity in the patterns, thus minimizing reliance on the use of 
arbitrary meta-programming functionality in the guards. In particu- 
lar, we have integrated a simple data-flow analysis into the matcher, 
which allows us to match a pattern against the content of a variable 
as well. This significantly improves our ability to distinguish struc- 
turally equivalent code fragments. Our main empirical contribution 
here is a significantly extended evaluation of our general annotation 
inference approach. In particular, we have evaluated AutoCert 
using C code generated by the Real-Time Workshop code gener- 
ator. Based on the extensions described here, we have been able 
to certify frame and initialization safety for code generated from 
Simulink and Embedded Matlab models, as well as several safety 
properties for a variety of programs generated by our AutoBayes 
[9] and AutoFilter [23] generators.

The next section gives some general background on the safety 
certification of automatically generated code and summarizes the 
underlying annotation inference algorithm as far as is required here; 
more details can be found in [6]. Section 3 explains the extended
pattern language used here. Section 4 contains a description of 
the different aspects of the annotation schema compiler, while 
Section 5 focuses on the practical experience we have gained so 
far. The final two sections discuss related work and conclude with 
an outlook on future work. 

2. Technical Background 

Here, we briefly summarize our approach to safety certification of 
automatically generated code and the generic annotation inference 
algorithm. More details can be found in our previous work [4, 5, 6].

2.1 Safety Certification 

Program Safety Safety certification demonstrates that a program 
does not violate certain conditions during its execution. A safety 
property [4] is an exact semantic characterization of these condi- 
tions, while a safety policy is a set of Hoare rules designed to show 
that a program satisfies the safety property of interest. Language- 
specific properties can be applied to all programs in the underlying
programming language. For example, variable initialization before
use (init) ensures that each variable or individual array element
has been explicitly assigned a value before it is used, while array
bounds safety (array) requires each access to an array element to
be within the specified upper and lower bounds of the array. Our approach
can also be used with more specific domain-specific properties.
For example, frame safety (frame) shows that navigation software
uses the different frames of reference consistently [14, 19].

Annotation and Verification We split certification into an un-
trusted annotation construction phase (see below for details) and 
a simpler but trusted verification phase, where the standard ma- 
chinery of a verification condition generator (VCG) and automated 
theorem prover (ATP) is used to fully automatically prove that the 
code satisfies the required properties. As usual in Hoare-style ver- 
ification, a VCG traverses annotated code and applies the calculus 
rules of the safety policy to produce verification conditions (VCs). 
These are then simplified, completed by an axiomatization of the 
relevant background theory and passed to an off-the-shelf ATP. If 
all VCs are proven, we can conclude that the program is safe with 
respect to the safety policy, and, given the policy is sound, also the
safety property. Note that the annotations are required as “hints” 
or lemmas for the ATP, and must be established in their own right. 
Consequently, they remain untrusted — a wrong annotation cannot 
compromise the assurance provided by the system. 

2.2 Idioms 

The idioms used by a code generator are essential to our approach 
because they (rather than the generator’s building blocks or combi- 
nation methods) determine the interface between the generator and 
the inference algorithm. The idioms and corresponding patterns are 
specific to the given safety property, but the inference algorithm re- 
mains the same for each property. This allows us to apply our tech- 
nique to black-box generators as well, as the example of Real-Time 
Workshop shows. Moreover, it also allows us to handle optimiza- 
tions: as long as the resulting code remains idiomatic, neither the 
specific optimizations nor their order matter. We can thus customize 
a verifier for a given generator and safety property, by identifying 
the relevant idioms and formalizing them as patterns. 

The idioms represent the key knowledge that drives the annota- 
tion inference. However, we need to distinguish different classes of 
idioms, in particular, definitions, uses, and barriers. Definitions es- 
tablish the safety property of interest for a given variable, while us- 
es refer to locations where the property is required. Barriers repre- 
sent any statements that appear between definitions and uses (in the 
control flow graph) that require annotations, i.e., principally loops. 
In the case of initialization and frame safety, the definitions are the 
different initialization blocks, while the uses are statements which 
read a variable (i.e., contain an rvar). In the case of array bounds
safety, the definitions correspond to fragments which set the val- 
ues of array indices, while the uses are statements which access an 
array variable. In all cases, barriers are loops. 

2.3 Inference Algorithm Structure 

The inference algorithm itself is then based on two related key ob- 
servations. First, it is sufficient to annotate only in reverse along 
all CFG-paths between uses (where the property is required) and 
definitions (where it is established). Second, along each path it 
is sufficient to annotate only with the definition’s post-condition, 
or more precisely, the definition’s post-condition under the weakest
pre-condition transformation that is implemented in the VCG,
which corresponds to the safety condition which must hold at that 
point in the code. 

The inference algorithm builds and traverses the CFG and re- 
turns the overall result by side-effects on the underlying program 
P. It reduces the inference efforts by limiting the analysis to cer- 
tain program hot spots which are determined by the so-called “hot 
variables” and “hot uses” described in [6]. Note that the hot variables
are computed before the graph construction (and thus before
the actual annotation phase), in order to minimize the work in the 
subsequent stages. For each hot variable the algorithm then com- 
putes the CFG and iterates over all paths in the CFG that start with 
a hot use, before it finally constructs the annotations for the paths.

Abstracted Control Flow Graphs The algorithm follows the
control flow paths from variable use nodes backwards to all cor- 
responding definitions and annotates the barrier statements along 
these paths as required (see below for details). The CFGs are ab- 
stracted by collapsing entire code idioms matching specific pat- 
terns into individual nodes. Since the patterns can be parameterized 
over the hot variables, separate abstracted CFGs are constructed for 
each given hot variable. The construction is based on a straightfor- 
ward syntax-directed algorithm as for example described in [11]. 1 
The only variation is that the algorithm first matches the program 
against the different patterns, and in the case of a match constructs 
a single node of the class corresponding to the successful pat- 
tern, rather than using the standard construction and recursively 
descending into the statement’s subterms.

In addition to basic-nodes representing the different statement 
types of the programming language, the abstracted CFG can thus 
contain nodes of the different pattern classes. The algorithm is 
based on the notions of use- and definition-nodes and uses barrier-,
barrier-block-, and block-nodes as optimizations. The latter
three represent code chunks that the algorithm regards as opaque (to 
different degrees) because they contain no definition for the given 
variable. They can therefore be treated as atomic nodes for the pur- 
pose of path search, which drastically reduces the number of paths 
that need be explored. 
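
The backward path search over an abstracted CFG can be sketched as follows. This is only an illustration under stated assumptions: the dictionary-based graph, the node names, the `kind` labels, and the function `paths_to_definitions` are hypothetical and are not taken from AutoCert, whose core is implemented in Prolog. Definition nodes end a path; collapsed barrier and block nodes are passed through as single steps, which is what keeps the number of explored paths small.

```python
from typing import Dict, List

# Hypothetical abstracted CFG for one hot variable: each node has a kind and
# a list of predecessors. A collapsed idiom (e.g., a whole initialization
# loop) appears as one node.
CFG: Dict[str, dict] = {
    "entry":  {"kind": "basic",   "preds": []},
    "init_A": {"kind": "def",     "preds": ["entry"]},
    "loop_1": {"kind": "barrier", "preds": ["init_A"]},
    "blk_1":  {"kind": "block",   "preds": ["loop_1"]},
    "use_A":  {"kind": "use",     "preds": ["blk_1", "init_A"]},
}


def paths_to_definitions(cfg: Dict[str, dict], use_node: str) -> List[List[str]]:
    """Return all backward paths from use_node to definition nodes."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        path = path + [node]
        if cfg[node]["kind"] == "def":
            paths.append(path)      # a putative definition ends the path
            return
        for pred in cfg[node]["preds"]:
            if pred not in path:    # do not follow back edges twice
                walk(pred, path)

    walk(use_node, [])
    return paths
```

Calling `paths_to_definitions(CFG, "use_A")` yields one path through the block and barrier nodes and one direct path to the definition; in the approach described here, every such path would then be annotated.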

Annotation of Paths For each hot use of a hot variable, the path 
computation returns a list of paths to putative definitions. They have 
been identified by successful matches, but without the safety proof 
we cannot tell which, if any, of the definitions are relevant. In fact, it 
may be that several separate definitions are needed to fully define a 
variable for a single use. Consequently, all paths must be annotated. 

Paths are annotated in two stages. First, unless it has already 
been done during a previous path, the definition at the end of the 
path is annotated. Second, the definition’s post-condition (which 
has to hold at the use location and along the path as well) is taken as 
the initial annotation and propagated back along the path from the 
use to the definition. Since this must take computations and control 
flow into account, the current annotation is updated as the weakest 
pre-condition of the previous annotation. Both the computation of 
pre-conditions and the insertion of annotations are done node by 
node rather than statement by statement. 
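
The second stage can be illustrated with a small sketch, assuming a hypothetical encoding of formulas as nested tuples and of path nodes as `("assign", var, expr)` or `("barrier", name)`; none of these names come from the paper. The weakest pre-condition of an assignment is computed by substitution, and the current annotation is recorded at each barrier.

```python
def subst(term, var, expr):
    """Replace every occurrence of the variable `var` in `term` by `expr`."""
    if term == var:
        return expr
    if isinstance(term, tuple):
        return tuple(subst(t, var, expr) for t in term)
    return term


def propagate(path, post):
    """Propagate `post` backwards along `path` (listed from the use towards
    the definition). Returns the annotations attached to barrier nodes and
    the condition that remains at the end of the path."""
    annotations = {}
    current = post
    for node in path:
        if node[0] == "assign":
            _, var, expr = node
            current = subst(current, var, expr)   # wp(x := e, Q) = Q[e/x]
        elif node[0] == "barrier":
            annotations[node[1]] = current        # annotate the barrier
    return annotations, current
```

For instance, with the post-condition `("init", ("sel", "A", "k"))` and a path that first crosses the assignment `k := k0 + 1` and then a barrier `loop_1`, the barrier is annotated with the condition in which `k` has been replaced by `k0 + 1`.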

Annotation of Nodes The path traversal described above calls 
the actual annotation routines (whether implemented manually or 
generated from the annotation schemas) when it needs to annotate 
a node. Three classes of nodes need to be annotated: definitions, 
barriers, and basic nodes which are also loops. However, the most 
important (and interesting) class is the definitions because their 
annotations (more precisely, their final post-conditions) are used 
as initial values for annotation along the paths. 

For example, we can define a separate annotation schema for 
each of the three different initialization blocks shown in Figure 1.
Each schema inserts a final (outer) post-condition establishing that
the matrix A is initialized, e.g., in the first two cases
∀ 0 ≤ i ≤ N, 0 ≤ j ≤ M · A_init[i, j] = init.

However, the annotations also need to maintain the “internal” 
flow of information within a definition. Hence, the schemas dealing 

1 Since the generators only produce well-structured programs, a syntax-
directed graph construction is sufficient. However, we could, if necessary, 
replace the graph construction algorithm by a more general version that can 
handle ill-structured programs with arbitrary jumps. 
Figure 2. Grammar of extended pattern language


with the situations shown in Figure 1(b) and 1(c) also need to insert 
an inner post-condition, as well as inner and outer loop invariants. 

Note that even after a pattern has been successfully matched, 
the annotation schema itself might still fail. For example, the pattern
in the schema handling the idiom in Figure 1(a) simply matches
against a sequence of assignments, but the schema requires that
the indices of the first and last assignments are the lower and upper 
bound of the array, respectively. Of course, even if the schema suc- 
ceeds, the generated VCs might fail since annotation construction is 
untrusted. In other words, matching is approximate, but ultimately 
checked by the prover.

3. Extended Pattern Language 

The annotation inference algorithm uses patterns to capture the 
idiomatic code structures and pattern matching to find the corre- 
sponding code fragments and build the CFG. The pattern language 
is essentially a tree-based regular expression language similar to 
XML-based languages like XPath [2]; Figure 2 shows its grammar.
Compared to [6], we added more contextual patterns (the looka- 
head operators / / and \\ and the outside-operator <t), operators 
to support interactions with the meta-program fragments construct- 
ing the actual annotations, and constraints ( : : ), in particular the 
data-flow lookback operator ~= . 

Core Patterns The language supports matching of tree literals
f(P1, ..., Pn) over a given signature Σ, wildcards (_) and the
usual regular operators for optional (?), list (*) and non-empty
list (+) patterns, as well as alternation (||) and concatenation
(;) operators. The ellipsis operator (...) allows the concise formulation
of enumerations. P1 ... P2 is compiled into P1; P*; P2, where
P = lcs(P1, P2) is the least common subsumer (or anti-unifier)
of P1 and P2. This is computed by replacing any two different
subterms at corresponding positions in the two terms by a fresh
variable. The committed choice operator is similar to
alternation, but tries the alternatives in a left-to-right order, and
commits to the first match, i.e., does not backtrack into the other
alternatives.
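
The computation of the least common subsumer can be sketched as follows. The term encoding (nested tuples with the functor first), the function name `lcs`, and the `_G` naming of fresh variables are assumptions made for this illustration. The sketch follows the description above literally and introduces a new variable at every position where the two terms differ.

```python
from itertools import count


def lcs(t1, t2, fresh=None):
    """Least common subsumer of two terms encoded as (functor, arg1, ...)."""
    if fresh is None:
        fresh = count()
    if t1 == t2:
        return t1
    same_shape = (
        isinstance(t1, tuple) and isinstance(t2, tuple)
        and len(t1) == len(t2) and t1[0] == t2[0]
    )
    if same_shape:
        # Same functor and arity: generalize the arguments pairwise.
        args = tuple(lcs(a, b, fresh) for a, b in zip(t1[1:], t2[1:]))
        return (t1[0],) + args
    # Different subterms at corresponding positions: use a fresh variable.
    return f"_G{next(fresh)}"
```

Applied to the first and last assignment of an enumeration such as the one in Figure 1(a), e.g. `(":=", ("idx", "A", 0, 0), "a00")` and `(":=", ("idx", "A", "n", "m"), "anm")`, the result keeps the assignment structure and the array name but generalizes both indices and the right-hand side.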

Context Dependencies Unlike a “pure” regular expression lan- 
guage, our pattern language allows us, to some limited degree, to 
express context dependencies. This can be achieved by two different
mechanisms, contextual patterns and pattern meta-variables.
Contextual patterns generalize the idea of lookahead that is well-known
from regular expression matching. A contextual pattern
P1 op P2 consists of a base pattern P1 that must be matched against
the input, and will eventually be returned as match result, and a
context pattern P2 that can rule out potential base matches, depending
on the given context operator op. Possible operators are
lookahead (//) and its complement (i.e., P1 \\ P2 matches if P1
is not followed by P2), which check the right siblings of the term
matched against the base pattern (i.e., work horizontally), and various
forms of subterm matching, which check its descendants and
ancestors (i.e., work vertically). Hence, the positive subterm operator
matches all terms that match P1 and have at least one subterm that
matches P2; similarly, its complement matches all terms that match P1
and have no subterm that matches P2. 2 In contrast to these two
inward-looking operators, the outside-operator looks outward: it checks
for instances of P1 which are not within any enclosing occurrence of P2.
This has proved very useful to rule out accidental matches.
Uninstantiated pattern meta-variables match any term but, unlike a wildcard,
they then become instantiated with the matched term and subsequently
match only against further instances of the first match. For
example, the pattern (_[_] := _)+ matches the entire statement list
A[1] := 1; A[2] := 2; B[1] := 1 while the pattern (x[_] := _)+
matches only the two assignments to A but not the final assignment
to B, due to the instantiation of x with A.
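
The difference between a wildcard and a meta-variable in this example can be reproduced with a small sketch. The statement encoding as `(array, index, value)` triples and the function `match_prefix` are hypothetical; the sketch only handles the two list patterns from the example and is not a general matcher.

```python
def match_prefix(stmts, array_pat):
    """Longest prefix of assignments X[_] := _ accepted by a list pattern.

    array_pat is "_" (wildcard) or a meta-variable name such as "x".
    Returns the number of matched statements and the resulting bindings.
    """
    bindings = {}
    matched = 0
    for array, _index, _value in stmts:
        if array_pat != "_":
            # A meta-variable is instantiated by its first match and must
            # agree with that instantiation afterwards.
            bound = bindings.setdefault(array_pat, array)
            if bound != array:
                break
        matched += 1
    return matched, bindings


STMTS = [("A", 1, 1), ("A", 2, 2), ("B", 1, 1)]
```

With the wildcard, all three statements are accepted; with the meta-variable `x`, matching stops before the assignment to `B` because `x` is already bound to `A`.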

Interaction Patterns Another extension of the pattern language 
describes interactions with the meta-program fragments constructing
the actual annotations. The two operators <- and @ are used to
compile the guards and actions of the corresponding schema. The
weave-operation P <- U executes an update action U on the program
fragment matched against P when the annotation schema is
applied, and thus weaves in the annotation. U can be an arbitrary
meta-program operation prim-op of type Te -> Te, but typically
it just adds a list of annotations to the target fragment, and we
provide two built-in operations for this case. &(A) simply adds
the annotations A to the target fragment, while &&(A) recursively 
adds A to all barriers inside the target fragment. This is mostly used 
for the junk handling described in Section 4.3. In both cases, anno- 
tations are simply formulas F £ T. labeled with their purpose as 
invariant, pre- or post-condition. The access-operator P @ x binds
the meta-variable x to the term matched against P, so that it can be
referred to in the guards and actions. This is similar to the use of
pattern meta-variables, but allows P to be further instantiated.

Constraints Constraints are similar to contextual patterns in the
sense that the base pattern P will be returned as result only if 
the constraint C is satisfied. C is either a data-flow lookback (see 
below), or an arbitrary meta-program operation. These can for 
example be used to check structural properties of the match that 
cannot be expressed in the pattern language, e.g., identical lengths 
of two different lists. 

Data-flow Lookback Since pattern matching works on the syn- 
tactic structure of the program, all relevant semantic differences 
must be reflected syntactically. However, in practice, this is often 
not the case. For example, in navigation software transformations 
between different frames of reference can be represented by a di- 
rection cosine matrix (DCM) [19]; Figure 3 shows the structure of 
two example DCMs transforming from the NED frame into two 
different target frames. 

For the certification of frame safety, we need to be able to distinguish
between the two different DCMs, but the code generated by Real- 
Time Workshop uses temporary variables to store the elements, and 
the matrix (represented as a vector) is updated using these (see 
Figure 4(a)). Note that additional temporaries are used to factor 
out common subexpressions. In order to identify the sequence of 
array updates as the DCM-NED-to-ECEF idiom, and to distinguish 

2 In [6], these were denoted by P2 ∈ P1 and P2 ∉ P1, respectively.
Figure 3. DCM matrices: (a) NED-to-ECEF (b) NED-to-NAV


it from the structurally equivalent DCM-NED-to-NAV idiom, we 
thus need to match the content of the variables v0 to v8 (and thus
the content of the meta-variables x0 to x8) against the respective
patterns. 


(a)
    c0 := -1
    w0 := cos(in5)
    w1 := sin(in4)
    w2 := sin(in5)
    v0 := c0*w0*w1;
    v1 := c0*w1*w2;
    ...
    v8 := c0*w1;
    a[0] := v0;
    ...

(b)
    [pattern: sequence of array updates (A[0] := x0), ..., with lookback
    constraints on x0 to x8, e.g., -sin(P)]

Figure 4. DCM-NED-to-ECEF code fragment (a) and pattern (b)

Rather than using arbitrary meta-programs to analyze the pro- 
gram structure, we introduce a specific constraint operator that trig- 
gers a simple, approximate data-flow analysis to infer possible sym- 
bolic values of program variables that are then checked against the 
constraint pattern. Figure 4(b) shows the actual pattern used to capture
the idiom. The schema itself is shown in Section 5.2. The
structural core of the pattern is simply the sequence of array updates,
but each right-hand side is constrained by an appropriate
lookback. When an update such as a[0] := v0 is matched, the
data-flow analysis looks back through the program to find possible
values for v0. The preceding assignment yields c0*w0*w1,
for which a match is attempted against the constraining pattern
-cos(L) * sin(P). This attempt fails, which triggers further lookbacks
to values of the variables occurring in the value found for
the original variable v0, i.e., c0, w0, and w1. Using the theory
matching described below, the lookback eventually succeeds, with
the meta-variables L and P instantiated with in5 and in4, respectively.
Note that a “plain” lookback (i.e., a reverse lookahead)
would remain insufficient in such situations, since the required value
of x0 is only constructed in several steps and several locations.

The data-flow lookback is only an approximation, since it ignores
control flow predicates and CFG back edges. However, this
approximation remains safe, as all matches are checked by the VCs
and thus ultimately by the ATP.
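
The lookback for v0 can be mimicked by a short sketch. It rests on several simplifying assumptions that are not part of the paper: the preceding assignments are given as a flat dictionary `DEFS` (so control flow is ignored entirely), expressions are nested tuples, products are compared modulo commutativity by normalizing them, and the pattern is given as a coefficient plus a list of `(functor, meta-variable)` factors with distinct functors.

```python
def expand(expr, defs):
    """Replace variables by their recursively expanded defining expressions."""
    if isinstance(expr, str):
        return expand(defs[expr], defs) if expr in defs else expr
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(expand(a, defs) for a in expr[1:])
    return expr


def normalize(expr):
    """Flatten a product into (numeric coefficient, sorted other factors)."""
    coeff, factors, stack = 1, [], [expr]
    while stack:
        e = stack.pop()
        if isinstance(e, tuple) and e[0] == "*":
            stack.extend(e[1:])
        elif isinstance(e, (int, float)):
            coeff *= e
        else:
            factors.append(e)
    return coeff, sorted(factors, key=repr)


def lookback_match(var, defs, pattern):
    """Match the symbolic value of var against (coeff, [(functor, meta), ...]).
    Returns the meta-variable bindings, or None if the match fails."""
    coeff, factors = normalize(expand(var, defs))
    p_coeff, p_factors = pattern
    if coeff != p_coeff or len(factors) != len(p_factors):
        return None
    bindings = {}
    for factor, (p_functor, meta) in zip(factors, sorted(p_factors, key=repr)):
        if not (isinstance(factor, tuple) and len(factor) == 2):
            return None
        functor, arg = factor
        if functor != p_functor or bindings.setdefault(meta, arg) != arg:
            return None
    return bindings


# Assignments preceding a[0] := v0, as in Figure 4(a).
DEFS = {
    "c0": -1,
    "w0": ("cos", "in5"),
    "w1": ("sin", "in4"),
    "w2": ("sin", "in5"),
    "v0": ("*", "c0", "w0", "w1"),
}
```

Matching `v0` against the pattern for -cos(L) * sin(P), written here as `(-1, [("cos", "L"), ("sin", "P")])`, binds L to `in5` and P to `in4`, in line with the instantiation described above.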

Match procedure The match procedure traverses terms first top- 
down and then left-to-right over the direct subterms, returning as 
result triples where the first two arguments are the root position 
and length of the match of the top-level pattern, and the third is 
a substitution with bindings for the pattern meta-variables. The
meta-variables are instantiated eagerly (i.e., as close to the root as 
possible) but instantiations are undone if the enclosing pattern fails 
later on. List patterns follow the usual “longest match” strategy 
used in traditional regular expression matching. Lookahead and 
subterm matching are implemented in a straightforward way, but
the performance of the pattern matcher has been sufficient so far.
Constraints are checked whenever a match for the base pattern has 
been found. However, the dataflow lookback requires interaction 
with the CFG construction and the term traversal, as traversed terms 
need to be pushed on a stack for later inspection. 

The match procedure also supports a limited form of matching 
modulo theory: users can specify how tree literal patterns can be 
mapped onto terms. We use this to handle some irrelevant syntactic 
variance in the programs, for example, to handle commutative 
operators such as addition and multiplication, or to identify block 
patterns of the form {*;P; *} with single statements matching 
P. This feature has proved very useful, but it has to be used with 
care, since the indiscriminate use of such mappings can increase 
the search space for matching substantially and can also lead to 
unintended matches and hence a loss of control. 

4. Annotation Schema Representation and 
Compilation 

An annotation schema is a declarative representation of the knowl- 
edge required to handle a class of specific certification situations. 
Its main components are a code pattern that describes both the 
structure of the object program fragments to which the schema is 
applicable and where the annotations will be added, and two lists 
of run-time guards and actions that will be first executed when the 
pattern is matched against the object program, and then used to 
compute the actual annotations that are added. In practice, guards 
are rarely required, and none of the schemas shown here uses them. 

The AutoCert annotation schema compiler takes a collection 
of annotation schemas tailored towards a specific code generator 
and safety property, and compiles it down into a customized an- 
notation inference algorithm. Since we are reusing AutoCert’s 
core annotation inference algorithm outlined above and described 
in more detail in [6], which is implemented in Prolog, the output of 
the compiler is simply a set of Prolog clauses. 

4.1 Schema Representation 

An annotation schema bundles together all knowledge that is re- 
quired by the annotation inference algorithm to handle a class of 
specific certification situations. In addition to the pattern and the 
run-time guards and actions this also includes the safety policy or 
policies under which the schema is applicable and the node class 
that will be attached to the matched object program fragments. 
Since the schema compiler is implemented in Prolog, we simply 
represent schemas by Prolog facts or clauses. This allows us to use 
arbitrary Prolog code as compile-time guards and actions to the 
schemas and thus to further simplify their formalization. In the ex- 
ample shown in Figure 5, 3 which comes from Simulink, we can 
thus use the same schema (with appropriately parameterized ac- 
tions) for two of the different safety properties init and range (a 
vector satisfies range (dim(A, N)) if all its entries are within the 
bounds of the TVth dimension of array A), although we concen- 
trate on init here. The schema clauses also contain some addition- 
al information that is used by the schema compiler, namely the 
schema name (for reference purposes), and the name of a pattern 
pre-processing predicate (here default) which can be used to 
simplify the description of the patterns and the advice. See Sec- 
tion 4.3 for details. 

The for_assign schema is designed to annotate loops that
initialize arrays element by element. For example, in order to facil- 
itate a proof that 


3 Here, and in the rest of the paper, we type-set the patterns using concrete 
syntax to improve the legibility of the schemas. Our implementation uses 
standard Prolog terms. 


schema (for_assign 
, SP 

, def (A) 

, (for (/ := _ to _)@ Index do 

/; /; *1 /])<§ Al := . 1> A) <- &(post 50 
) *- &(inv Inv, post Post ) 

, default 

, [] 

, [safe (SP, AI, SC), 

ind.schema ( stepl , SC, [Index], [lnv,Post])] 
) SP=init ; SP = range (_) . 


Figure 5. Annotation schema for_assign 


for i := 1 to N do 
a[i] := b[i]; 

actually initializes the array a, the schema needs to construct an 
appropriate loop invariant and post-condition, resulting in the an- 
notated loop 

for i := 1 to N inv ∀ 1 ≤ j < i · a_init[j] = init do
  a[i] := b[i];
  post a_init[i] = init
post ∀ 1 ≤ j ≤ N · a_init[j] = init
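
To make the role of these annotations concrete, the following sketch executes the loop and checks the invariant and both post-conditions as runtime assertions. It is only a dynamic illustration, not the static proof performed by the VCG and ATP; the explicit shadow array `a_init` and the function name `run_annotated_loop` are assumptions made for this example.

```python
def run_annotated_loop(b):
    """Execute `for i := 1 to N do a[i] := b[i]` with a shadow array a_init,
    checking the annotations at run time. Index 0 is unused so that the
    code mirrors the 1-based indexing of the example."""
    n = len(b) - 1
    a = [None] * (n + 1)
    a_init = [False] * (n + 1)          # True stands for "init"
    for i in range(1, n + 1):
        # inv: all elements with 1 <= j < i are initialized
        assert all(a_init[j] for j in range(1, i))
        a[i] = b[i]
        a_init[i] = True
        # inner post: the element just assigned is initialized
        assert a_init[i]
    # outer post: all elements with 1 <= j <= N are initialized
    assert all(a_init[j] for j in range(1, n + 1))
    return a, a_init
```

The inner post-condition is what re-establishes the invariant for the next iteration, and the invariant at loop exit gives the outer post-condition.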

The first step in designing this schema is to specify the core 
pattern that will be used to identify instances of the general loop 
structure in the program. Here we are looking for single for-loops 
with arbitrary lower and upper bounds, where the loop body con- 
sists of an update of an arbitrary array A , in which the loop’s index 
variable I is used as index. We allow additional indices left and
right of I, provided they contain no further occurrences of I (thus
restricting the schema to arrays effectively used as vectors), and 
require that the right-hand side of the assignment contains no fur- 
ther occurrences of the array A that is being initialized. This can be 
expressed concisely in our pattern language: 

for I := _ to _ do
  A[*⋫I; I; *⋫I] := _ ⋫ A
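The constraints of this core pattern can be mimicked on a toy term representation. The following Python sketch is only an illustration: the nested-tuple encoding of programs and the names occurs and match_for_assign are assumptions made here, whereas the actual implementation matches Prolog terms against patterns.

```python
def occurs(term, name):
    """True if `name` occurs anywhere inside a nested-tuple term."""
    if term == name:
        return True
    return isinstance(term, tuple) and any(occurs(t, name) for t in term)

def match_for_assign(loop):
    """Match ("for", I, lo, hi, ("assign", ("index", A, indices), rhs)).

    Returns the bindings for I, A and the full left-hand side AI,
    or None if the constraints of the pattern are violated.
    """
    if not (isinstance(loop, tuple) and len(loop) == 5 and loop[0] == "for"):
        return None
    _, i, _lo, _hi, body = loop
    if not (isinstance(body, tuple) and len(body) == 3 and body[0] == "assign"):
        return None
    _, lhs, rhs = body
    if not (isinstance(lhs, tuple) and len(lhs) == 3 and lhs[0] == "index"):
        return None
    _, a, indices = lhs
    # exactly one index position is the loop variable itself ...
    if sum(1 for ix in indices if ix == i) != 1:
        return None
    # ... and the remaining indices must not mention it
    if any(occurs(ix, i) for ix in indices if ix != i):
        return None
    # the right-hand side must not mention the array being initialized
    if occurs(rhs, a):
        return None
    return {"I": i, "A": a, "AI": lhs}

# a[i] := b[i] inside "for i := 1 to N" matches
ok = ("for", "i", "1", "N",
      ("assign", ("index", "a", ("i",)), ("index", "b", ("i",))))
print(match_for_assign(ok))   # {'I': 'i', 'A': 'a', 'AI': ('index', 'a', ('i',))}

# a[i] := a[i-1] is rejected because the right-hand side uses a
bad = ("for", "i", "1", "N",
       ("assign", ("index", "a", ("i",)), ("index", "a", (("-", "i", "1"),))))
print(match_for_assign(bad))  # None
```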

The second step is to add <- operations to splice the constructed
annotations into the appropriate locations. As outlined above,
we need an invariant Inv and post-condition Post on the loop itself. 
However, we also need to specify the post-condition on the indi- 
vidual array-update, which will be used to prove the loop’s post- 
condition. This yields 

(for I := _ to _ do
   (A[*⋫I; I; *⋫I] := _ ⋫ A) <- &(post PostAI)
) <- &(inv Inv, post Post)

Since this schema requires no guards, the final step is to add the 
actions that actually construct the annotations. Here, the actions 
consist of calls to the safety predicate safe and the annotation
construction predicate ind_schema (see Section 4.2) to construct
the post-condition for a single array-update and the loop invariant 
and post-condition, respectively. The predicates require access to 
specific parts of the actual program fragment matched against the 
pattern, e.g., the complete left-hand side of the array-update. Since 
this is not bound by a pattern meta-variable — note that A only 
contains the name of the array, not the entire access — the pattern 
used in the schema contains additional variables like AI that are 
bound to the relevant subterms and then used to pass them into the 
predicates (see Figure 5). In the above example, we thus get the
annotations SC = (a_init[i] = init), Inv = (∀ 1 ≤ j < i • a_init[j] = init),
and Post = (∀ 1 ≤ j ≤ N • a_init[j] = init), as expected.



schema(for_for_assign_lin
      , init
      , def(A)
      , (for (I := 0 to N)@IndexI do
           (for (J := 0 to _)@IndexJ do
              (((A[I*N'+J])@AIJ :: N+1=N') := _ ⋫ A) <- &(post SC)
           ) <- &(inv InvJ, post PostJ)
        ) <- &(inv InvI, post PostI)
      , default
      , []
      , [safe(init, AIJ, SC),
         ind_schema(step2, SC, [IndexI, IndexJ],
                    [InvI, PostI, InvJ, PostJ])]
      ).


Figure 6. Annotation schema for_for_assign_lin 

4.2 Induction Schemas 

Schemas can make use of arbitrary specialized meta-programming 
in order to construct annotations but, in general, most annotations
encapsulate general induction principles, so we use a generic predicate
ind_schema to construct them. This takes the form of induction
to use, the base formula (usually the safety predicate on the
hot variable), and the indices (i.e., bound variables and bounds) to
induct over, and returns the list of annotations.

Several types of induction are currently supported. The schemas 
discussed here use single (step1) and doubly-nested induction
(step2), which construct the necessary inner and outer invariants
and post-conditions; a further schema (diag) handles diagonal matrix
traversals.
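As a rough illustration of what the single-loop induction schema computes, the following Python sketch builds the invariant and post-condition of the for_assign example from Section 4.1 as strings. The names ind_schema_step1 and Index, the textual substitution, and the ASCII rendering of the formulas are assumptions made for this sketch; the actual predicate works on Prolog terms.

```python
from typing import List, NamedTuple

class Index(NamedTuple):
    """Hypothetical stand-in for a matched loop index: variable and bounds."""
    var: str
    lower: str
    upper: str

def ind_schema_step1(base: str, index: Index, bound: str = "j") -> List[str]:
    """Return [invariant, post-condition] for a single loop.

    `base` is the base formula with the loop variable in index position,
    e.g. "a_init[i] = init"; the loop variable is replaced by `bound`.
    """
    # naive textual substitution, adequate only for this illustration
    body = base.replace(f"[{index.var}]", f"[{bound}]")
    # invariant: the property holds for all elements visited so far
    inv = f"forall {index.lower} <= {bound} < {index.var} . {body}"
    # post-condition: the property holds over the full index range
    post = f"forall {index.lower} <= {bound} <= {index.upper} . {body}"
    return [inv, post]

inv, post = ind_schema_step1("a_init[i] = init", Index("i", "1", "N"))
print(inv)   # forall 1 <= j < i . a_init[j] = init
print(post)  # forall 1 <= j <= N . a_init[j] = init
```

The doubly-nested case would take two indices and return four formulas in the same way, which is why a small number of induction schemas suffices for many annotation schemas.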

We keep the induction schemas separate from the annotation 
schemas themselves for two main reasons. First, the induction 
schemas encapsulate general induction principles that work for 
multiple annotation schemas so that very few of them are needed. 
Second, an annotation schema does more than an induction schema. 
The latter just constructs some annotations, but the former says 
where to put those annotations, how to pre-process the pattern, 
under what conditions (i.e., guards) they should apply, and whether 
there are any dependent variables. 

4.3 Pattern Pre-processing 

Often, even auto-generated code does not exactly fit the pat- 
tern specified in a schema, but contains “junk”, i.e., additional 
statements that are irrelevant to the current hot variable. Such 
junk can be part of the original program structure, or it can be 
introduced by optimizations (e.g., loop-invariant computations 
that are hoisted out of an inner loop). Consider for example the 
for_for_assign_lin schema shown in Figure 6, which annotates
two nested for-loops initializing a single matrix A that is
represented as a vector; here, the constraint N + 1 = N' sym- 
bolically evaluates whether the multiplier N' has the right value. 
This schema should also apply in situations where the outer loop 
contains additional statements before or after the inner loop, and 
similarly for the inner loop, e.g., if, as the result of a loop fusion, 
two matrices are initialized at the same time. 

Extending the schema to cover these cases requires two steps. 
First, the junk statements need to be “matched away”, which can be 
achieved by adding list wildcards to the arguments of the statement 
patterns. Some care must be taken to ensure that these do not 
conflict with the proper pattern; we thus add additional constraints 
to the wildcards (see Figure 7). However, the junk fragments can
also contain statements that match barrier patterns and thus require 
annotations as well. These fragments will not be annotated during 
the CFG traversal because they have become part of the definitions. 


(for (I := 0 to N)@IndexI do {
   (* ⋫ for J := 0 to _ do {
          (* ⋫ (((A[I*N'+J])@AIJ :: N+1=N') := _ ⋫ A));
          (((A[I*N'+J])@AIJ :: N+1=N') := _ ⋫ A)
          *
        }) <- &&(inv InvI);
   (for (J := 0 to _)@IndexJ do {
      (* ⋫ (((A[I*N'+J])@AIJ :: N+1=N') := _ ⋫ A)
      ) <- &&(inv InvI ∧ InvJ);
      (((A[I*N'+J])@AIJ :: N+1=N') := _ ⋫ A) <- &(post SC)
      * <- &&(inv InvI ∧ InvJ ∧ SC)
   }) <- &(inv InvJ, post PostJ)
   * <- &&(inv InvI ∧ PostJ)
}) <- &(inv InvI, post PostI)


Figure 7. Pre-processed version of the for_for_assign_lin
pattern

Consequently, the junk fragments must in the second stage be 
annotated by the definition schema as well. 

The entire process can be automated because the annotations re- 
quired for the different junk positions can be derived systematically 
from the annotations given in the original pattern using the notion 
of current annotation:

• On entry to a loop pattern, the current annotation is set to the 
invariant attached to the loop (or to true, if no invariant is given), 
and its old value is saved. 

• On exit from a loop pattern, the current annotation is restored 
to the saved value, and the post-condition attached to the loop 
(if any) is added to it. 

• For any other pattern, the attached post-condition (if any) is 
added to it. 

The current annotation is then used to start annotating any barriers 
that are contained in the junk fragments. The annotation schema 
compiler simply keeps the current annotation while it pre-processes 
the patterns, and whenever it inserts a list wildcard to match junk 
fragments, it also splices in a recursive update (i.e., using the
&&-operator) with the current annotation. Figure 7 shows the pattern
that results from applying this default pre-processing to the pattern 
specified in Figure 6. Of course, the default can be overridden by 
specifying the full pattern. 
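The three rules for the current annotation can be sketched as a small traversal. The following Python fragment is a minimal illustration under stated assumptions: the classes Stmt and Loop, the function junk_slots, and the string representation of annotations are hypothetical, and the sketch follows the three rules literally as stated above. It is applied here to a single-loop pattern in the style of for_assign.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

@dataclass
class Stmt:
    name: str
    post: Optional[str] = None

@dataclass
class Loop:
    name: str
    inv: Optional[str] = None
    post: Optional[str] = None
    body: List[Union["Loop", Stmt]] = field(default_factory=list)

def conj(current: str, extra: Optional[str]) -> str:
    """Add an optional post-condition to the current annotation."""
    if extra is None:
        return current
    return extra if current == "true" else f"{current} & {extra}"

def _walk(nodes, current: str, owner: str, slots: List[Tuple[str, str]]) -> None:
    for node in nodes:
        # a list wildcard inserted here is annotated with the current annotation
        slots.append((f"junk before {node.name}", current))
        if isinstance(node, Loop):
            saved = current                                   # entry: save old value
            _walk(node.body, node.inv or "true", node.name, slots)
            current = conj(saved, node.post)                  # exit: restore, add post
        else:
            current = conj(current, node.post)                # other pattern: add post
    slots.append((f"junk at end of {owner}", current))

def junk_slots(loop: Loop) -> List[Tuple[str, str]]:
    """Current annotation at every junk position inside the loop pattern."""
    slots: List[Tuple[str, str]] = []
    _walk(loop.body, loop.inv or "true", loop.name, slots)
    return slots

pattern = Loop("loop", inv="Inv", post="Post", body=[Stmt("update", post="SC")])
for slot in junk_slots(pattern):
    print(slot)
# ('junk before update', 'Inv')
# ('junk at end of loop', 'Inv & SC')
```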

The definition of current annotations, and their use in the junk 
fragments, reflects the role loop invariants play in the Hoare- 
calculus. Since the loop invariant contains all information required 
to prove the body, all irrelevant loops (i.e., barriers) in the body 
need to maintain it, and all relevant loops (i.e., nested loops) need 
to contain a complete invariant as well as a sufficient post-condition 
by themselves. 

4.4 Dependent Hot Variables 

The inference first passes over the program to determine the hot 
variables before it proceeds along every path from every hot use 
until either a definition or the beginning of the program is reached. 
Sometimes, however, a definition will trigger further hot variables 
that could not be (efficiently) detected on the first pass. This happens,
intuitively, when one variable is computed from another. For
example, in the schema mtrans_int shown in Figure 8, the variable
A is computed as the transpose of T, so that its frame depends
on T’s frame (see footnote 4). The mtrans_int schema uses a syntactic variant,


4 Note that the schema has three nested post-conditions: the first (i.e., on the 
outer loop) states the element-wise definition of the transpose; the second 



schema(mtrans_int
      , frame
      , def(A)
      , (C := 0;
         ((for (I := 0 to N)@IndexI do
             (for (J := 0 to M)@IndexJ do {
                (((A[I+N'*J])@AIJ :: N+1=N') := T[C]);
                C++
              }) <- &(inv C=J+N'*I ∧ FPre ∧ InvJ,
                      post C=M+1+N'*I ∧ FPre ∧ PostJ)
           ) <- &(inv C=N'*I ∧ FPre ∧ InvI, post FPre ∧ PostI)
          ) <- &(post FPre ∧ A=trans(T))
        ) <- &(pre FPre, post FPost)
      , default
      , [T]
      , []
      , [FPre = has_frame(T, dcm(F2, F1)),
         FPost = has_frame(A, dcm(F1, F2)),
         ind_schema(step2, AIJ=T[J+N'*I],
                    [IndexI, IndexJ],
                    [InvI, PostI, InvJ, PostJ])]
      ).


Figure 8. Annotation schema mtrans_int


where the additional (fifth) argument indicates that T is a depen- 
dent hot variable for A. Inference will thus proceed past this defini- 
tion for A, and restart, looking for a definition for the new hot vari- 
able T. Specifying the dependent hot variables is straightforward 
using the schemas, which shows the power of the approach. In the 
previous system version using manual annotation clauses, comput- 
ing the dependent hot variables could require the implementation 
of complex term decomposition. 
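The effect of dependent hot variables on the search can be illustrated with a small worklist sketch in Python. It is a deliberate simplification: the program is modelled as a straight-line list of definitions rather than a CFG, and the names Program and definitions_for are assumptions made for this illustration only.

```python
from typing import List, Tuple

# Each program point is (defined_variable, dependent_hot_variables),
# listed in program order.
Program = List[Tuple[str, List[str]]]

def definitions_for(program: Program, hot_var: str) -> List[Tuple[str, int]]:
    """Search backwards from the end of the program for definitions.

    Whenever a definition declares dependent hot variables, the search
    continues past that definition, looking for their definitions too.
    """
    found: List[Tuple[str, int]] = []
    worklist = [(hot_var, len(program))]
    while worklist:
        var, start = worklist.pop()
        for pos in range(start - 1, -1, -1):
            defined, dependents = program[pos]
            if defined == var:
                found.append((var, pos))
                # restart from this definition for each dependent variable
                worklist.extend((dep, pos) for dep in dependents)
                break
    return found

# A is defined as the transpose of T, so T is a dependent hot variable.
program = [("T", []), ("A", ["T"])]
print(definitions_for(program, "A"))  # [('A', 1), ('T', 0)]
```

A variable may also be declared as its own dependent, in which case the search continues to an earlier definition of the same variable; the search position strictly decreases, so the loop terminates.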

4.5 Schema Compiler 

Since we are building on AutoCert’s existing, large infrastruc- 
ture code base, the actual annotation schema compiler is surpris- 
ingly small — approximately 1000 lines of Prolog code. It provides 
two top-level functions, corresponding to the phases (i.e., CFG con- 
struction and traversal) of our analysis. Both functions take as in- 
put a list of annotation schemas, but not necessarily the same. The 
first function simply pre-processes the patterns and passes the re- 
sult into the CFG-construction. The second function is the compiler 
proper. For each schema, it produces a corresponding annotate 
clause that is called from the existing inference algorithm when it is 
trying to annotate a CFG-node (Section 2.3). Each clause consists
of six general phases: (i) check that the program fragment
corresponding to the CFG-node matches the schema’s pre-processed
pattern (this is necessary because the two phases can use different
schemas); (ii) select the program fragment and bind the pattern’s
meta-variables, including those introduced by pre-processing; (iii)
evaluate the schema’s guards, to ensure applicability; (iv) execute
the schema’s actions, to construct the annotations; (v) execute
the update actions specified in the pattern; and finally, (vi) process
the dependent hot variables, if any are specified. In addition,
the compiler also generates several auxiliary functions required by 
the inference algorithm, e.g., extracting the overall post-condition 
attached to a pattern. This is the same structure as the manually 
implemented annotation clauses, which is hardly surprising, since 
both are called in the same context. However, the schemas are sig- 
nificantly more compact and on average amount to only about 35% 


“lifts” this to an explicit transpose operator; and the third uses this to derive 
the appropriate frame information. 


of the manual versions, and the schema compiler eliminates the te- 
dious term-operations in steps (ii) and (v) above, which are also a 
source of errors that are difficult to trace. 
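The phase structure of a generated clause can be sketched as a higher-order function. The Python fragment below is only an analogy under stated assumptions: the dictionary layout of a schema, the slot names, and the functions compile_schema, match_for_assign and build_annotations are hypothetical, whereas the actual compiler emits Prolog annotate clauses.

```python
from typing import Any, Callable, Dict, Optional

Env = Dict[str, Any]

def compile_schema(schema: Dict[str, Any]) -> Callable[[Any], Optional[Dict[str, Any]]]:
    """Build an annotate function from a declarative schema (hypothetical layout)."""
    def annotate(fragment: Any) -> Optional[Dict[str, Any]]:
        # (i) + (ii): match the fragment and bind the pattern's meta-variables
        env: Optional[Env] = schema["match"](fragment)
        if env is None:
            return None
        # (iii): every guard must hold for the schema to apply
        if not all(guard(env) for guard in schema["guards"]):
            return None
        # (iv): actions construct annotations and add them to the bindings
        for action in schema["actions"]:
            env.update(action(env))
        # (v): update actions place annotations at the slots named in the pattern
        annotations = {slot: env[var] for slot, var in schema["updates"].items()}
        # (vi): report dependent hot variables so that inference can continue
        dependents = [env[var] for var in schema["dependents"]]
        return {"annotations": annotations, "dependents": dependents}
    return annotate

def match_for_assign(fragment: Dict[str, str]) -> Optional[Env]:
    if fragment.get("kind") != "for_assign":
        return None
    return {"I": fragment["index"], "Lo": fragment["lo"],
            "Hi": fragment["hi"], "A": fragment["array"]}

def build_annotations(env: Env) -> Env:
    a, i, lo, hi = env["A"], env["I"], env["Lo"], env["Hi"]
    return {
        "SC": f"{a}_init[{i}] = init",
        "Inv": f"forall {lo} <= j < {i} . {a}_init[j] = init",
        "Post": f"forall {lo} <= j <= {hi} . {a}_init[j] = init",
    }

for_assign = {
    "match": match_for_assign,
    "guards": [],
    "actions": [build_annotations],
    "updates": {"update.post": "SC", "loop.inv": "Inv", "loop.post": "Post"},
    "dependents": [],
}

annotate = compile_schema(for_assign)
result = annotate({"kind": "for_assign", "index": "i", "lo": "1", "hi": "N", "array": "a"})
print(result["annotations"]["loop.inv"])  # forall 1 <= j < i . a_init[j] = init
```

The schema author only supplies the declarative parts; the binding and update steps, which the text identifies as the error-prone ones, are produced once by the compiler.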

5. Evaluation 

We have evaluated the schema compiler and its interaction with 
AutoCert’s core inference engine on code generated by two in-house
code generators, AutoFilter and AutoBayes, as well
as a COTS generator, Real-Time Workshop, which generates code 
with distinct characteristics from several modeling languages. Here 
we look at code generated from Simulink and Embedded Matlab 
models. 

5.1 AutoBayes and AutoFilter 

We originally developed the annotation schema compiler for use 
with our AutoBayes and AutoFilter generators. Both generate 
numerical code that uses many vector and matrix operations, and 
has complex control flow. 

For AutoBayes, we use three different program versions
(segm1, segm2, and segm3) generated from the same model by using different
initialization methods for an iterative clustering algorithm. These 
programs have been applied to an image segmentation problem 
for planetary nebula images taken by the Hubble Space Telescope. 
They have been used in our previous work on annotation infer- 
ence [6], which allows us to compare the results of the annotation 
schema compiler with manually implemented annotation code. 

For AutoFilter, we used a series of idealized models of the 
orbital dynamics of the Crew Exploration Vehicle using a simple 
aiding sensor for position and velocity (see footnote 5). orb assumes that the
earth is a perfect ellipse and is formulated as a two-body problem
using Kepler’s Laws [19]. AutoFilter generates Kalman filter
based state estimation code from this, which estimates the state
of the CEV from the sensor readings. orbj2 extends orb by
adding so-called J2 perturbations. These are additional terms in the 
differential equations of the process model of the vehicle dynamics 
which account for irregularities in the earth’s gravitational field. 
orbj2_bier represents the same model but where the generator
is configured to select a different algorithm, namely the Bierman 
measurement update. This uses LU matrix decomposition in order 
to represent matrices in a more numerically stable form. Generating 
code for these models required extensions to AutoFilter, which
rendered obsolete the manually implemented annotation clauses 
used in our previous work. 

Initialization Safety. Table 1 shows the results of applying the
inference engine for the init safety property to the code generated
from the above models by AutoFilter and AutoBayes. 

The first two columns give the size of the generated programs 
and the size of the inferred annotations, which are as large as, and 
in some cases substantially larger than, the program itself. The third 
column gives the number of definition patterns used to generate the 
annotations for each program. This is, in contrast, quite small — 
in each case here, only either 2 or 3 patterns are required to han- 
dle the programs. This is partly because the junk mechanism al- 
lows a single high-level pattern to capture much of the variability 
present in the code, and confirms our intuition that our pattern lan- 
guage is a highly concise means of encapsulating the knowledge 
required to prove safety properties. In total, we needed only 8 and 
6 schemas for each of AutoBayes and AutoFilter, respective- 
ly, to formalize initialization safety; 5 of these are shared between 
both systems. Translating the existing manually implemented an- 
notation clauses into new schemas was straightforward. Adapting 

5 These models were developed by the first author together with Johann 
Schumann, and are based on a model of the orbital coasting mode of the 
space shuttle developed by the second author. 




Table 1. Annotation inference: results for init-property



Table 2. Annotation inference: results for array-property

the system to the new orb and orbj2 code required only very
few iterations to get the annotations right and the VCs proven. This
process was much simpler and faster than in the old approach using 
the manually implemented annotation clauses. 

The next column gives the number of verification conditions 
generated from the annotated program. The additional algorithmic 
complexity for orbj2_bier is reflected in a substantially larger number
of VCs, although it requires only one more pattern. The subsequent
columns list the times taken to infer the annotations, to
apply the VCG (which includes simplification) and to prove the 
VCs. Inference time is clearly negligible in comparison to prover 
time, which dominates the overall run-time (see footnote 6).

Array Safety. Table 2 shows the results of applying the inference
engine for the array safety property to the same models and generator
configurations. This property is significantly simpler than init,
and this is reflected in both the number of definition patterns and
the number of VCs. In fact, for most of the cases here, no
definitions are required. This is a consequence of no uses being designated
hot [6]. There are, nevertheless, still some annotations generated
(simple loop bounds which do not require patterns). In several
cases, the VCs are simplified away entirely before the prover phase. 

The only cases which require definition patterns are segm2
and segm3, which make use of array indirection, and so require
annotations to give bounds on the values of matrix elements. Each 
example requires a single schema which was again straightforward 
to formulate. 

5.2 Real-Time Workshop: Simulink 

We used AutoCert to generate a customized verifier for showing 
frame safety of C code generated from Simulink models by Real- 
Time Workshop. We then used this verifier on a navigation sub- 
system currently under commercial development for NASA, which 
transforms the coordinate frames of various signals. The signals 
represent state information using quaternions and the software con- 
verts the quaternions to and from DCMs, so that matrix algebra 
can be used to perform the transformation. Several DCMs (NED- 
to-Nav, NED-to-ECEF, and ECI-to-ECEF) are constructed directly 

6 All times here are wall-clock times in seconds, measured on an otherwise 
idle 2.2GHz standard PC with 3GB RAM running Red Hat Enterprise Linux 
WS release 4. We used the SSCPA system [18] to run the E (version 0.999) 
[17] and SPASS (version 3.0c) [21] theorem provers in parallel. 


schema(dcm_ned_ecef
      , frame
      , def(A)
      , ((A[0] := x0) :: x0 ~= -cos(L) * sin(P);
         (A[1] := x1) :: x1 ~= -sin(L) * sin(P);
         (A[2] := x2) :: x2 ~= cos(P);
         (A[3] := x3) :: x3 ~= -sin(L);
         (A[4] := x4) :: x4 ~= cos(L);
         (A[5] := x5) :: x5 ~= 0;
         (A[6] := x6) :: x6 ~= -cos(L) * cos(P);
         (A[7] := x7) :: x7 ~= -sin(L) * cos(P);
         (A[8] := x8) :: x8 ~= -sin(P)
        ) <- &(post has_frame(A, dcm(ned, ecef)),
               pre ∃ λ, φ • has_unit(λ, geolong) ∧ has_unit(φ, geolat)
                   ∧ x0 = -cos λ sin φ ∧ x1 = -sin λ sin φ
                   ∧ x2 = cos φ ∧ x3 = -sin λ ∧ x4 = cos λ ∧ x5 = 0
                   ∧ x6 = -cos λ cos φ ∧ x7 = -sin λ cos φ ∧ x8 = -sin φ)
      , none
      , []
      , []
      ).


Figure 9. Annotation schema dcm_ned_ecef 



Table 3. Annotation inference: results for frame-property

using standard trigonometric formulas and taking various physical 
quantities either as input from the signals or as constants, namely, 
geodetic latitude, longitude, time, true heading, platform azimuth, 
and the Earth’s rotational velocity [19]. 

nav3 and nav5 represent two different conceptual components 
of the navigation subsystem that carry out specific transformations. 
nav52 is generated from an equivalent model to nav5, but using 
different Real-Time Workshop configuration settings; consequent- 
ly, the generated code is quite different. There are numerous other 
subsystems not discussed here that use the same basic components. 
nav3 and nav5 were chosen to minimize functional overlap, so 
that they actually comprise most of the blocks in the subsystem. In 
all cases, AutoCert was provided with assumptions on the frames 
and physical units of the input signals, and the aim of the verification
was to establish that the output, a quaternion state vector,
is in the correct coordinate frame. Table 3 shows the results. Note 
that the size for nav3 and nav5 includes both components, since 
Real-Time Workshop merges them into a single program. 

nav5 and nav52 use the schema dcm_ned_ecef shown in 
Figure 9, whereas nav3 uses a similar schema dcm_ned_nav (not 
shown here but see Figure 3 for the structure of the required DCM). 
In total, frame safety requires 15 schemas. Of these, 7 describe 
specific transformations like dcm_ned_ecef , which could be tran- 
scribed directly from the literature. The remaining schemas formal- 
ize the effects of the applied matrix operations, including some of 
Matlab’s built-in functions. 
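As a plausibility check on such a transcribed schema, the nine entries of dcm_ned_ecef can be evaluated numerically to confirm that they form a proper rotation matrix. The following Python sketch assumes a row-major layout of A[0..8] and reads L as geodetic longitude and P as geodetic latitude (both in radians); the function names are hypothetical and the check is not part of AutoCert.

```python
import math

def dcm_ned_ecef(lon: float, lat: float):
    """3x3 matrix with the nine entries of the dcm_ned_ecef schema (row-major assumed)."""
    cl, sl = math.cos(lon), math.sin(lon)
    cp, sp = math.cos(lat), math.sin(lat)
    return [
        [-cl * sp, -sl * sp, cp],
        [-sl, cl, 0.0],
        [-cl * cp, -sl * cp, -sp],
    ]

def is_rotation(m, tol: float = 1e-12) -> bool:
    """True if the rows of m are orthonormal and det(m) = +1."""
    for r in range(3):
        for s in range(3):
            dot = sum(m[r][k] * m[s][k] for k in range(3))
            if abs(dot - (1.0 if r == s else 0.0)) > tol:
                return False
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return abs(det - 1.0) <= tol

print(is_rotation(dcm_ned_ecef(0.3, 0.7)))  # True
```

Such a check only guards against transcription slips in the formulas; establishing that the matrix relates the intended frames still requires the logical theory used by the prover.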

The proof times shown in Table 3 are substantially longer than 
for init and array, reflecting the more complex mathematical reasoning
that is required. Discharging these VCs requires a logical
theory of matrix and frame algebra, but this is orthogonal to the development
of the schemas and is not discussed here. As before, the
inference time is negligible in comparison to prover time. 






5.3 Real-Time Workshop: Embedded Matlab 

Embedded Matlab is a mathematical scripting language which al- 
lows the use of functions and equations in models. Variables in the 
equations typically represent vectors and matrices and therefore the 
generated code is heavily loop-based, and quite different in character
from code generated from “pure” Simulink.

Here we illustrate the init certification of code generated from 
an Embedded Matlab model consisting of four matrix equations 
from a Kalman filter. The generated code is about 150 LOC and 
could be certified using just two schemas, of which one could even 
be reused from AutoBayes/AutoFilter. The other schema
needed, for_for_assign_lin (cf. Figures 1(c) and 6), is specific
to Embedded Matlab. Below we show the fragment which
uses the two schemas, including the annotations generated by the 
schemas; we omitted constraints on the loop variables to simpli- 
fy the presentation. Here, the array x, which represents a matrix, 
is first assigned via a doubly-nested for-loop and then inverted 
via a sequence of assignments. Since inference works backwards 
through the CFG, the assignment sequence is first annotated. By 
setting x as its own dependent variable, inference can then proceed 
on to the loop. 

for i0 := 0 to N
  inv ∀ 0 ≤ i < i0, 0 ≤ j ≤ N • x_init[i + j*2] = init do
  for i1 := 0 to N
    inv ∀ 0 ≤ i, j ≤ N • (i < i0 ∨ (i = i0 ∧ j < i1))
          ⇒ x_init[i + j*2] = init do
    x11 := 0;
    for i2 := 0 to N
      inv ∀ 0 ≤ i, j ≤ N • x11_init = init ∧
          (i < i0 ∨ (i = i0 ∧ j < i1)) ⇒ x_init[i + j*2] = init do
      x11 +:= bv0[i2+i1*2] * dv0[i0+i2*2];
    x[i0+i1*2] := x11 + R[i0+i1*2];
  post ∀ 0 ≤ i ≤ i0, 0 ≤ j ≤ N • x_init[i + j*2] = init
post ∀ 0 ≤ i, j ≤ i0 • x_init[i + j*2] = init
x11 := x[0];
d := x[1]*x[2] - x11*x[3];
x[0] := x[3] / d;
x[3] := x11 / d;
x[1] := -x[1] / d;
x[2] := -x[2] / d;
post ∀ 0 ≤ i ≤ 3 • x_init[i] = init

5.4 Optimizing Generators 

One of the advantages of the annotation schemas is their ability to 
specify patterns at a high-level and let the machinery handle the 
variability in the code. Since we consider the code generator as a 
black box, and make no assumptions about the way the code is 
generated, but only rely on its final form, our approach is also ap- 
plicable, therefore, to optimizing code generators. In particular, the 
existing patterns are — in combination with the default pattern pre- 
processing — insensitive to many commonly applied optimizations, 
including common subexpression elimination, loop hoisting, and 
loop fusion. 

We have exploited this to handle optimizations in the Real- 
Time Workshop generators for Simulink and Embedded Matlab. 
Consider for example the unoptimized fragment on the left, which 
is optimized (using loop hoisting and loop fusion) as shown on the 
right: 

for i := 1 to N do              for i := 1 to N do
  for j := 1 to M do              v := 1/i*i;
    a[i,j] := 1/i*i;              for j := 1 to M do
for i := 1 to N do                  a[i,j] := v;
  for j := 1 to M do                b[i,j] := v+1;
    b[i,j] := a[i,j]+1;


In both cases, the for_for_assign schema (which is a generalization
of the for_assign schema to nested loops) is applicable.
The reason that for_for_assign remains insensitive
to the optimization is the list wildcard patterns added during pre- 
processing. These absorb the code fragments introduced or moved 
into a new location by the optimizations. In the unoptimized case, 
each pair of loops will become a definition node for the respective 
initialized variable (with the other pair becoming a barrier node), 
and the list wildcards will be set to empty. In the optimized case, 
the fused loops will become the definition for both variables, and 
the list wildcards will be matched against the assignments to v and 
the other array variable. Note that this causes the program fragment
(i.e., the fused loop) to be annotated multiple times (with different 
annotations), but this is also possible for unoptimized code. In the 
case of Embedded Matlab, the for_for_assign_lin schema is
also able to absorb the effects of these optimizations in the same 
way. 
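How the list wildcards absorb statements around the core assignment can be shown on the inner loop bodies of the two fragments. In this Python sketch, statements are plain strings, and the names match_with_junk and assigns_to are assumptions for illustration; the real matcher operates on program terms and pre-processed patterns.

```python
from typing import Callable, List, Optional, Tuple

def match_with_junk(
    stmts: List[str], is_core: Callable[[str], bool]
) -> Optional[Tuple[List[str], str, List[str]]]:
    """Split a statement list into (junk_before, core, junk_after).

    The wildcards may absorb anything except a further statement that
    matches the core pattern, so exactly one core statement is required.
    """
    hits = [k for k, stmt in enumerate(stmts) if is_core(stmt)]
    if len(hits) != 1:
        return None
    k = hits[0]
    return stmts[:k], stmts[k], stmts[k + 1:]

def assigns_to(array: str) -> Callable[[str], bool]:
    """Core pattern: an assignment to an element of the given array."""
    return lambda stmt: stmt.startswith(f"{array}[")

unoptimized_body = ["a[i,j] := 1/i*i"]
optimized_body = ["a[i,j] := v", "b[i,j] := v+1"]

# Unoptimized: both wildcards are empty.
print(match_with_junk(unoptimized_body, assigns_to("a")))
# ([], 'a[i,j] := 1/i*i', [])

# Optimized: the fused body is a definition for a and for b, and the
# wildcards absorb the assignment to the respective other array.
print(match_with_junk(optimized_body, assigns_to("a")))
# ([], 'a[i,j] := v', ['b[i,j] := v+1'])
print(match_with_junk(optimized_body, assigns_to("b")))
# (['a[i,j] := v'], 'b[i,j] := v+1', [])
```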

6. Related Work 

Annotation inference, or invariant generation, is an active research 
area. Approaches use both static and dynamic program analysis 
methods, and can further be distinguished according to the cate- 
gory of the inferred annotations: we can contrast type annotations, 
where properties are checked by special type systems, with logi- 
cal annotations, which are usually processed by a VCG and then a 
general-purpose theorem prover. Our work is in the latter category.
However, generally these approaches hard-code specific domain
knowledge and cannot be customized simply, if at all, in the same 
way our approach allows it. 

Early approaches [7, 20] are based on predicate propagation and 
use inference rules similar to a strongest post-condition calculus
to push an initial logical annotation forward through the program. 
Loops are handled by a combination of different heuristics until 
a fixpoint is achieved. However, these methods still need an initial 
annotation, and unlike our approach, the loop handling still induces 
a search space at inference time. Moreover, the constructed annota- 
tions are often only candidate invariants and need to be validated (or 
refuted) during inference, which further increases the search space.

Kovacs and Jebelean [12] use techniques from algebraic combinatorics
and polynomial algebra to compute polynomial relations
between variables that are assigned to within loops. These relations 
are then turned into annotations and supplied to a VCG. The aim 
is to characterize the behavior of loop variables in order to prove 
the functional correctness of numeric procedures. They are able to 
precisely characterize the class of loops for which they can infer 
annotations, although users must manually add any non-algebraic 
assertions (e.g., inequalities) which are required. Abstract interpre- 
tation has also been used to infer annotations in separation logic 
for pointer programs [13] although the techniques required there 
are fairly specialized and elaborate compared to our patterns. 

AOP is usually concerned with dynamic properties of programs 
but [15] gives a language, inspired by description logic, for de- 
scribing static properties of programs. Their pattern language has 
some similarities to ours, but is used to define pointcuts that match 
against violations of design rules, and the advice is simply the corresponding
error message. Since they are concerned with localizing
errors, there is no need to infer annotations or propagate informa- 
tion throughout the program. Our pattern language also captures 
static properties but, in contrast, is essentially used to match against 
fragments which establish the specified property. 

Antkiewicz et al. [3] use code queries, which are approxima- 
tions to structural and behavioral patterns, in order to reverse engi- 
neer framework-specific models from framework code. It is similar 
to our work in the sense that we use patterns to reverse engineer 
“logical structure”. 



Generate-and-test approaches use a fixed pattern catalogue to construct
candidate annotations and then try to validate (or refute) them, using
static or dynamic methods. Houdini [10] is a static generate-and- 
test tool that uses ESC/Java to statically refute invalid candidates. 
Houdini starts with a candidate set for the entire program and then 
iterates until a fixpoint is reached. This increases the computation- 
al effort required, and in order to keep the approach tractable, the 
pattern catalogue is deliberately kept small. Hence, Houdini is in- 
complete, and acts more as a debugging tool than as a certification 
tool. Daikon [8] is a dynamic annotation inference tool. Its tester 
accepts all candidates that hold without falsification but with a suf- 
ficient degree of support over the test suite. In order to verify the 
candidates, Daikon has also been combined with ESC/Java [16].
However, like all dynamic annotation generation techniques, it re- 
mains incomplete because it relies on a test suite to generate the 
candidates and can thus miss annotations on paths that are not exe- 
cuted often enough. 

Finally, the specific problem of frame safety has been addressed 
by Lowry et al. [14], who used a domain-specific type system to 
verify the safety of abstract geometric calculations. The language 
analyzed was quite simple, however, so that annotations could be 
restricted to inputs with no need for the inference of patterns or 
intermediate annotations. Although the underlying domain knowl- 
edge is similar to what we use for the frame example, in contrast to 
our “retargetable verifier”, this is a very specific solution. 


7. Conclusions and Future Work 

We have presented a declarative annotation schema language and 
a schema compiler which, together with a generic annotation in- 
ference engine, forms the AutoCert system. We have devel- 
oped a set of schemas which customizes AutoCert for certify- 
ing the frame safety of navigation code generated from Simulink 
models. Other sets of schemas support the certification of Embed- 
ded Matlab-generated code, as well as the entire range of models 
and configurations (i.e., algorithmic variants and optimizations) for 
AutoBayes and AutoFilter. This continues the work begun in
[6] and represents a significant advance in both the power and
expressivity of the technique.

By raising the level of abstraction at which annotation knowl- 
edge is expressed, we are able to concisely capture many variations 
of the underlying code idioms. In particular, we can easily deal with 
optimizations which obscure low-level code structure. 

We are developing additional sets of schemas and extending the 
schema language itself to support the certification of code generat- 
ed from a wider range of models. There are various physical and 
geometric properties that can be analyzed similarly to coordinate 
frames, such as the correct use of Euler angles, quaternion handed- 
ness, and so on, and we plan to adapt the frame schemas for those 
properties. 

Finally, although our emphasis so far has been on certifying 
safety, the schema language and inference engine are not limited 
to this and, in fact, several of the schemas we have presented 
here are actually verifying full functional correctness of certain 
fragments in order to establish safety. For example, in order to 
verify frame safety for the examples above, we need to verify the 
correctness of the underlying matrix transformations. Similarly, the 
various DCM schemas are effectively functional verifications of 
those constructions. We intend to further explore the possibilities 
for functional verification. 